Developing Guidelines for the Annotation of Anaphors in the Chinese Treebank
Abstract
This paper describes the CTB Coreference Annotation Guidelines for annotating pronominal anaphoric expressions in the Penn Chinese Treebank. The goals of the annotation are to provide training data for learning-based pronoun resolution tools and to provide a "gold" standard for evaluating pronoun resolution algorithms. The choices that were made concerning the coindexing of pronominal anaphors and their antecedents are discussed, as are some questions that arose in trying to categorize those pronominal expressions that did not refer to specific nominal entities in the text.
Similar resources
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Automatic Adaptation of Annotations
Manually annotated corpora are indispensable resources, yet for many annotation tasks, such as the creation of treebanks, there exist multiple corpora with different and incompatible annotation guidelines. This leads to an inefficient use of human expertise, but it could be remedied by integrating knowledge across corpora with different annotation guidelines. In this article we describe the pro...
Iterative Transformation of Annotation Guidelines for Constituency Parsing
This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training d...
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefor...
The Bracketing Guidelines for the Penn Chinese Treebank (3.0)
This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public. This document can be divided into six parts. Section I discusses six funda...
Publication date: 2002